CMAJ Open
● CMA Impact Inc.
Preprints posted in the last 7 days, ranked by how well they match CMAJ Open's content profile, based on 12 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Timilshina, N.; Jacobson, D.; Birze, A.; Wodchis, W. P.; Kuluski, K.; Strumpf, E.; Ammi, M.
Introduction The COVID-19 pandemic profoundly disrupted healthcare delivery worldwide, with cancer care among the most affected services. Prior studies documented delays in referrals, reduced specialist access, and increased provider burden. However, the extent to which these experiences were reflected at the system level remains unclear. Objective To document cancer care experiences and examine whether these experiences were reflected in population-level health system indicators across Ontario, Canada. Methods We used an exploratory sequential mixed-methods design. Qualitative data were collected through focus groups and semi-structured interviews with 32 participants, including patients with cancer (n=8), caregivers (n=5), healthcare providers (n=14), and decision-makers (n=5) across two hospital settings in Ontario, Canada. Emergent themes informed the development of quantitative indicators. We then conducted a retrospective population-based analysis of linked administrative health databases for cancer patients in Ontario (n=87,786) to assess the prevalence of identified themes. Results Four themes emerged: (I) delays in diagnosis and screening; (II) disrupted access to primary care; (III) barriers to specialist and mental health services; and (IV) fragmented care for patients with multimorbidity. Quantitative findings corroborated major themes. Screening rates declined for cervical (64.8% to 57.5%) and breast cancer (64.5% to 57.2%). While in-person primary care shifted almost entirely to virtual modalities (8.5% to 95.4%), overall visit volumes remained stable. Specialist care showed uneven patterns, with increased oncology visits but declines in cardiology and mental health services. Patients with multiple comorbidities experienced the largest reductions in non-oncology specialist care. Conclusion The pandemic disrupted key components of cancer care, particularly screening, access to certain specialist services, and care for patients with complex needs. 
Integrating qualitative and quantitative evidence highlights areas of system vulnerability and underscores the need for coordinated, resilient cancer care capable of maintaining essential services during future crises.
Jones, L.; Ergas, R.; Tibbs, A.; Russo, E. T.; Norville, J.; Bingay, B.; Brown, C. M.; Reich, N. G.; Pasco, R.
Background Pediatric immunizations for Respiratory Syncytial Virus (RSV), including monoclonal antibodies for infants and vaccines for pregnant people, have become broadly available and can prevent severe RSV outcomes in infants. However, quantifying the impact of RSV immunization in prevention of severe pediatric illness at the population-level is limited by lack of RSV case surveillance data. The Massachusetts Department of Public Health (DPH) conducted a modeling analysis using routine public health surveillance data to estimate the state-level impact of new RSV immunization products on Emergency Department (ED) visits and hospitalizations in Massachusetts for highest risk pediatric groups. Methods A scenario projection tool, called R.Scenario.Vax, was utilized to simulate RSV-associated ED hospital encounters by age group in the context of newly available immunizations. ED visit and hospitalization data from the National Syndromic Surveillance Program (NSSP) during the time period 10/08/2017–10/19/2024 were analyzed, scaled to account for changes in RSV testing practices over time and missing encounter volume in historic data, and utilized to inform model fit of a "typical" RSV season. RSV immunization data from the Massachusetts Immunization Information System (MIIS) for the 2023–2024 and 2024–2025 RSV seasons informed high and moderate pediatric RSV immunization coverage scenarios and their impact was compared to a counterfactual reference scenario of no new immunizations. Median projections were quantitatively and qualitatively compared to observed 2024–2025 season data. Percent reduction in hospital encounters and encounters averted per 10,000 population were calculated for each scenario as compared to the reference. Results Projections for the youngest at-risk age groups showed significantly lower RSV-associated ED visits and hospitalizations during the 2024–2025 season for both high and moderate immunization coverage scenarios. 
Median projections for infants under 6 months old in the highest coverage scenario, wherein nearly all infants were immunized, showed 72.6% lower ED visits and 73.4% lower hospitalizations when compared to the reference scenario, equating to 262 ED visits and 85 hospitalizations averted per 10,000 population. Conclusions Our results support the use of modeling methods for public health insights and suggest that RSV immunizations for infant populations result in significantly lower RSV-related ED encounters in Massachusetts.
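The two summary measures named in the abstract above (percent reduction and encounters averted per 10,000 population, each relative to the reference scenario) come down to simple arithmetic. The sketch below is illustrative only: the function names and the example counts are hypothetical and are not taken from the study or from the R.Scenario.Vax tool.

```python
def percent_reduction(reference: float, scenario: float) -> float:
    """Percent reduction of a scenario's encounters relative to the reference."""
    if reference <= 0:
        raise ValueError("reference encounters must be positive")
    return 100.0 * (reference - scenario) / reference


def averted_per_10k(reference: float, scenario: float, population: float) -> float:
    """Encounters averted per 10,000 population, relative to the reference."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 10_000 * (reference - scenario) / population


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
reference_visits = 400.0   # projected visits with no new immunizations
scenario_visits = 100.0    # projected visits under an immunization scenario
age_group_population = 20_000

print(percent_reduction(reference_visits, scenario_visits))
print(averted_per_10k(reference_visits, scenario_visits, age_group_population))
```

With these made-up inputs the scenario is 75% below the reference and averts 150 visits per 10,000 population.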
Jayaprakash, A.; Liberati, E.; Lindsay, R.; Willars, J.; Gibson, J.; Fritz, Z.; Price, A.; Hatfield, T.; Richards, N.; Martin, G.
Objectives People with mental health conditions experience increased rates of diagnostic errors and delays in acute treatment. While causes such as diagnostic overshadowing (misattribution of physical symptoms to mental health conditions) are well documented, less attention has been paid to the organisational and structural conditions that shape diagnostic work. This study examines how physical illness is diagnosed in patients with mental health conditions in emergency departments (EDs), with a focus on the structural conditions that enable or constrain safe diagnostic practice. Method We conducted a multi-site ethnography across three purposively selected EDs in England between April 2023 and April 2024, varying in size, population demographics, and local service configuration. Data were collected through 284 hours of non-participant observation and 20 semi-structured interviews with ED staff. Results Our analysis identified four recurring structural gaps that shaped the conditions under which physical health diagnosis took place for patients with mental health conditions: a design gap, whereby targets and physical layouts constrained diagnostic reasoning; a preparedness gap, reflecting the lack of structural support to allow staff to act on their existing knowledge and skills; a coordination gap, reflecting fragmented ownership and the challenges of joint assessment across mental and physical healthcare teams; and an expectation gap, whereby unmet need elsewhere in the system increased demand for ED services that were beyond its formal scope. These gaps made diagnostic errors and delay more likely for patients with mental health conditions seeking physical healthcare in the ED. Conclusions As new dedicated mental health EDs are introduced in England, there is an opportunity to avoid reproducing these structural gaps in new settings. 
Our study suggests that improving physical healthcare for patients with mental health conditions requires changes to how EDs are designed, resourced and supported, and how they connect with the wider health and care system. Keywords: mental health, diagnostic inequality, emergency departments
Khan, D. Z.; Mao, Z.; Wijekoon, A.; Das, A.; Williams, S. C.; Blandford, A.; Jain, A.; Harris, L.; Borg, A.; Dorward, N. L.; Clarkson, M.; Bano, S.; McCulloch, P.; Stoyanov, D.; Marcus, H.
Introduction: Precise anatomical navigation is fundamental to safe endoscopic pituitary surgery, a high-stakes procedure characterised by a challenging learning curve. While traditional navigation systems often rely on workflow-disrupting probes or static preoperative imaging, advancements in computer vision AI (CVAI) now enable dynamic, real-time anatomical segmentation directly from live surgical video [1-3]. Our group has previously conducted a series of preclinical human-computer interaction studies to refine the system's design, alongside digital and high-fidelity physical simulations demonstrating the benefit of AI assistance in improving overall performance, training, and safety [4-8]. Building on this foundation, the current study represents a first-in-human application of real-time CVAI assistance in the neurosurgical operating room, serving to assess feasibility and safety, and to iteratively improve the system. Method: Guided by DECIDE-AI and IDEAL frameworks, this single-centre evaluation comprises an initial proof-of-concept phase (n=6) for endoscopic transsphenoidal pituitary surgeries. The AI model utilised a DINOv3-derived vision transformer architecture, deployed via a high-performance edge computing unit to achieve low-latency, real-time inference without reliance on cloud infrastructure [2]. Given the high-risk nature of the procedure and the early stage of clinical AI integration, the system was initially deployed as an educational adjunct on a secondary monitor, ensuring the primary surgical feed remains uncompromised. Functionality and safety were assessed via structured questionnaire, prospective observation, and blinded retrospective review of the recordings of the endoscopic surgical video feed and wider operating room environment. Continuous multi-stakeholder feedback through validated human factors surveys drove iterative technical refinements between cases. Results: Six patients with pituitary adenomas were enrolled. 
The CVAI system was successfully deployed in four cases, demonstrating acceptable real-time sella segmentation accuracy. Deployment failed pre-operatively in two cases owing to a single recurring system reboot bug. Iterative refinements between cases were driven by our experience and surgical team feedback. These resulted in the integration of additional anatomical structure segmentations (e.g., carotid arteries), enhanced model accuracy via training dataset expansion, and hardware firmware upgrades. Multi-stakeholder surveys demonstrated satisfactory system feasibility, usability, and acceptability among the surgical team. Both prospective observation and retrospective video review confirmed the absence of adverse events, including no significant distraction to the primary surgeon, and there were no AI-related clinical complications. Conclusion: This first-in-human early clinical evaluation demonstrates the feasibility, safety and iterative development of real-time, CVAI-based anatomical navigation during high-stakes neurosurgery. Future work will include a larger single-centre case series (IDEAL Stage 2a) with more surgical teams to further iterate the system and explore its impact on training and workflow. As the underpinning technology improves, deployment will transition to direct intra-operative decision support and integration with other intra-operative navigational technologies.
Squire, K.
Background. The emergency department in the United States of America functions as a residual access point for healthcare and social services for populations including rural communities, the uninsured, mental health and addiction patients, and the unhoused. The workforce variable that determines unit function (experience density, the concentration of accumulated clinical judgment within a unit workforce) is not measured in hospital accounting systems. Objective. To document workforce composition changes in U.S. emergency nursing across the 2018 and 2022 cycles of the National Sample Survey of Registered Nurses (NSSRN), and to specify falsifiable predictions for the 2026 cycle. Methods. We analyzed NSSRN public-use files using a four-way ED definition extending Castner et al. (2024) and a hospital-bedside-restricted comparator. Variance estimation used jackknife replicate weights for 2018 and Successive Differences Replication for 2022. Burnout was operationalized using the Norful et al. (2023) leaving-reasons proxy across cycles, with sensitivity analysis using the 2022 direct burnout item. Results. A 15-year trajectory (2008-2022) documents progressive experience-density compression: the ED's 15+ year veteran cohort fell from 41.9% to 28.0% over the decade preceding the pandemic, a loss of nearly a third of the senior cohort and a 19.6% decline in mean experience density, before recovering modestly to 33.3% as veteran nurses remained through the pandemic acute phase, leaving the ED as the youngest hospital setting throughout. Hospital non-ED bedside nurses lost senior tenure between cycles (mean 15.65 → 14.06 years since first licensure; 15+ year share 43.5% → 38.7%), while ED nurses retained their senior tail (mean 11.60 → 12.58). Burnout endorsement rose sharply in both populations (non-ED 27.3% → 46.0%; ED 34.2% → 61.2%), with the ED-vs-non-ED gap more than doubling. 
Controlling for tenure, ED status was not independently associated with burnout in 2018 (OR 1.15, 95% CI 0.83-1.59) but was strongly associated in 2022 (OR 1.92, 95% CI 1.44-2.55; p<.001). The direct burnout item showed a parallel pattern (OR 2.92, 95% CI 1.62-5.28). Conclusions. A pandemic-era setting-specific burnout effect emerged in emergency nursing that workforce-composition controls cannot explain. The 2022 cycle establishes a pre-exit baseline against which the 2026 NSSRN will serve as the falsifiable test of post-Omicron veteran exit. Nursing pipeline replacement lag exceeds the interval before 2026 data arrives; the consequences of inaction fall on populations dependent on ED-based residual access.
Hartlage, C. S.; Manning, E. R.; Bernard, J.; Vaish, S.; Gray, J.; Young, M.; Pestian, T.; Folger, A. T.; Tachinardi, P.; Mendonca, E. A.; Brokamp, C.
Objective: To evaluate whether a locally hosted open-weight large language model (LLM) can extract documented psychosocial factors from pediatric psychiatric intake notes and apply validated extraction to a large emergency psychiatry cohort. Materials and Methods: We identified emergency department presentations at Cincinnati Children's Hospital Medical Center from January 1, 2016, through December 31, 2024, among patients younger than 18 years with psychiatric billing diagnoses. Using full-text intake notes, gpt-oss:120b classified peer conflict, sleep disruption, and school-related academic, attendance, and disciplinary issues as detected, negated, or indeterminate. Four human raters independently reviewed 50 notes. We compared Fleiss' kappa among humans alone versus humans plus the LLM, assessed repeated-query stability across 50 independent calls per note, and applied the workflow to all eligible notes. Results: Among 37,315 eligible admissions, 22,284 had eligible intake notes; 22,270 produced parseable JSON. In detected-versus-not-detected coding, human-plus-LLM reliability did not differ significantly from human-only reliability across measures (human κ 0.71-0.94; human-plus-LLM κ 0.70-0.93). Stability was associated with human agreement: mean LLM-human agreement increased from 42.6% for classifications with less than 80% stability to 82.7% for classifications with 100% stability (Pearson r = 0.36). Full-cohort extraction showed frequent and overlapping documented factors: sleep disruption was most frequently detected (57.7%), followed by peer conflict (47.2%), academic issues (43.4%), disciplinary issues (43.3%), and attendance issues (16.9%). Discussion: Agreement varied by construct and was strongest when repeated model outputs were stable. Conclusion: Locally hosted open-weight LLMs can support scalable structured extraction of documented psychosocial factors from pediatric psychiatric intake notes after local validation.
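The abstract above compares rater reliability using Fleiss' kappa. For readers unfamiliar with the statistic, here is a minimal sketch of the standard formula. The input layout (one row per note, one column per category, each cell holding the number of raters who chose that category) and the function name are assumptions made for illustration; this is not the authors' analysis code.

```python
import math


def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to category j.
    Every subject must be rated by the same number of raters (at least two).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    n_categories = len(counts[0])
    total_ratings = n_subjects * n_raters

    # Share of all ratings that fall in each category.
    p_cat = [sum(row[j] for row in counts) / total_ratings
             for j in range(n_categories)]

    # Per-subject agreement: fraction of rater pairs that agree.
    p_subj = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]

    observed = sum(p_subj) / n_subjects
    expected = sum(p * p for p in p_cat)
    if math.isclose(expected, 1.0):
        return float("nan")  # undefined when every rating is in one category
    return (observed - expected) / (1.0 - expected)


# Hypothetical table: 3 notes, 4 raters, categories (detected, not detected).
print(fleiss_kappa([[4, 0], [0, 4], [4, 0]]))
```

Adding the LLM as a fifth rater would mean adding its label to each row's counts and recomputing, which is one plausible reading of the human-plus-LLM comparison described above.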
Corona-Moreno, R.; Acuna-Zegarra, M. A.; Santana-Cibrian, M.; Velasco-Hernandez, J. X.
During the COVID-19 pandemic, limited testing capacity and reporting delays complicated epidemic surveillance and decision-making in Mexico. We calibrated covidestim, a Bayesian nowcasting model, to estimate total SARS-CoV-2 infections from reported cases and deaths using Mexican surveillance data. Disease-progression distribution priors were calibrated using Mexico City records and validated through comparisons with national seroprevalence surveys, hospitalization data, and annual reported severe-case rates across all states. Using the reconstructed estimates of active infections, we implemented an event-based risk framework that quantifies the probability of encountering at least one infectious individual in gatherings of different sizes. This probability was subsequently translated into a four-level epidemiological traffic-light indicator and computed at both state and municipality levels. The resulting estimates revealed substantial spatial heterogeneity that is obscured by state-level aggregation, particularly in states with marked differences between urban and rural municipalities. To evaluate consistency with public-health indicators, we compared the proposed risk classification with the official Mexican epidemiological traffic-light system, considering interpretable gathering sizes relevant to public-health decision making. Weekly reports derived from this framework were delivered to policymakers in the State of Queretaro, Mexico, as an anticipation tool for school reopening and public-space management. These results demonstrate that Bayesian reconstruction of infections combined with event-based risk metrics can provide an interpretable and generalizable municipality-level complement to routine surveillance systems, particularly in regions with limited testing capacity and heterogeneous local transmission dynamics.
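The event-based risk described above (the probability of meeting at least one infectious person in a gathering of a given size) is commonly computed as 1 - (1 - p)^n, where p is the prevalence of active infections and n is the gathering size. The sketch below assumes attendees are drawn independently from the local population. The cut-offs and level names used for the four-level indicator are hypothetical placeholders, not the thresholds used in the study.

```python
def gathering_risk(active_infections: float, population: float,
                   group_size: int) -> float:
    """Probability that at least one of `group_size` attendees is infectious.

    Assumes attendees are sampled independently from the local population,
    so each one is infectious with probability equal to the prevalence.
    """
    if population <= 0 or group_size < 0:
        raise ValueError("population must be positive and group_size non-negative")
    prevalence = min(max(active_infections / population, 0.0), 1.0)
    return 1.0 - (1.0 - prevalence) ** group_size


def traffic_light(risk: float,
                  cutoffs: tuple[float, float, float] = (0.25, 0.50, 0.75)) -> str:
    """Map a risk in [0, 1] to one of four levels (hypothetical cut-offs)."""
    levels = ("green", "yellow", "orange", "red")
    for level, cutoff in zip(levels, cutoffs):
        if risk < cutoff:
            return level
    return levels[-1]


# Hypothetical municipality: 100 active infections among 10,000 residents.
risk_50 = gathering_risk(active_infections=100, population=10_000, group_size=50)
print(round(risk_50, 3), traffic_light(risk_50))
```

Because the risk depends on prevalence within each municipality, two municipalities in the same state can land on different levels even when the state-level aggregate looks uniform.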
King, D. W.; King, P. E.; Blanchard, M. W.; Ning, N. W.; King, S. K.; Grimm, M. C.; Ha, T.; Eagar, K.
Objective To determine whether it is possible to assess individual risk of developing colorectal cancer (CRC) in people who are at high risk because of their family history. Design/Method Retrospective observational study of prospectively collected data from consecutive patients referred for a colonoscopy. A total of 2,478 consecutive patients were referred to a single colorectal surgical practice in Sydney, Australia between 1977 and 2018 for a colonoscopy because of a family history of CRC. Of these, 1,963 have been followed for more than 10 years and are the subject of this paper. Histopathological findings were categorised as normal (N), non-advanced adenoma (NAA) or advanced neoplasia (AN), with AN proven to be the precursor to CRC. Intervention Colonoscopic screening on the basis of contemporary practice to 2006 and subsequently according to Australian National Health and Medical Research Council guidelines. Results Participants with normal or low-risk findings in the first decade remain at lower risk of CRC for 30 years from the commencement of screening. Conclusion It is possible to stratify individual patients in a high relative risk cohort into those with high or low personal risk of CRC based on colonoscopic findings in the first 10 years of surveillance. Those with no AN in the first ten years have a lower 30-year risk of developing AN than the general community. This offers the possibility of structuring surveillance programs around individual risk rather than group risk, lessening the need for multiple surveillance colonoscopies in the majority of such patients and improving the cost effectiveness of CRC screening at the population level.
Lewis, S.; Andrews, A.; Laing, H.
Objectives Value-Based Health Care (VBHC) increasingly guides health system redesign internationally. Despite the increasing availability of VBHC education, gaps remain between health professionals' conceptual understanding of VBHC and their confidence to implement it in practice. This study assessed perceived learning needs and preferences of healthcare professionals across foundational topics essential to VBHC implementation. Design Cross-sectional online survey study Setting and participants The survey was distributed to the global VBHC community and yielded 518 responses. Most respondents were based in the UK and Ireland (51%) and 65% had more than 10 years of experience in the health sector. Participants represented a variety of professional backgrounds, including clinicians (34%), operational or executive managers and leaders (22%), and life sciences or procurement professionals (13%). Primary and secondary outcome measures Primary outcome measures included self-reported interest and confidence across 15 VBHC domains and the magnitude of the gap between them. Secondary outcomes included perceived implementation challenges and preferred VBHC learning approaches, including prior engagement with VBHC-related learning. Results Respondents identified substantial VBHC implementation challenges, including implementing outcome measurement (62.4%), conflicting priorities (57.7%), and resistance to change (56.8%). Interest in all VBHC domains was high (median ≥80/100), while confidence to implement remained substantially lower across most domains (median ≤50/100). The largest interest-confidence gaps were observed for reimbursement mechanisms, costing methodology, and overcoming implementation challenges. Interactive learning approaches, including in-person seminars/workshops (55.2%) and online masterclasses (53.9%), were preferred over self-directed formats. 
Conclusions This international survey identified consistent gaps between health professionals' interest in VBHC and their confidence to implement key VBHC domains in practice. Addressing these gaps through advanced, targeted and contextual education may support more effective and sustainable VBHC implementation in practice.
Hines, A. G.; Mathis, S. M.; Johansson, M. A.; Biggerstaff, M.; Reed, C.; Borchering, R.
Since the U.S. 2013/14 influenza season, the CDC's FluSight Challenge has provided a platform for evaluating influenza forecasting models and fostering collaboration across institutions. The Challenge aims to improve the science and enhance the utility of infectious disease forecasts for public health decision making. We analyzed ten years of submitted forecasts (2014/15-2019/20 (influenza-like illness seasons) and 2021/22-2024/25 (hospital admissions seasons)) across a range of model types, including statistical, mechanistic, machine learning, and hybrid models. Influenza-like illness (ILI) forecasts were evaluated using the exponentiated logarithmic score (skill metric) while hospital admissions forecasts were evaluated using the log transformed relative Weighted Interval Score. Corresponding potential performance differences were assessed using Wilcoxon rank-sum tests, and associations with team participation history were evaluated using Spearman's rank correlation. Model performance varied by season, and no single model type consistently outperformed others. In ILI seasons, statistical models generally performed better than mechanistic and machine learning models, though consistent differences were not observed in more recent hospital admissions seasons. Ensemble forecasts showed better overall performance across seasons, and the CDC's FluSight ensemble ranked among the top-performing forecasts every year. We also found a positive correlation between forecast accuracy and the number of years a team participated in the Challenge, with statistically significant associations in four seasons. These findings highlight the benefits of ensemble approaches and sustained engagement in improving forecasting performance, while also underscoring the continued value of forecast evaluation before and following the COVID-19 pandemic. 
Insights from the FluSight Challenge can guide future infectious disease forecasting efforts and support more effective public health preparedness.
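The hospital admissions forecasts above were scored with a variant of the Weighted Interval Score (WIS). As a rough illustration, the sketch below computes the plain WIS for a single forecast from its median and a set of central prediction intervals, using the standard definition. The relative and log-transformed versions used in the evaluation are not reproduced here, and the data layout (a dict mapping each alpha to its interval bounds) is an assumption made for this example.

```python
def interval_score(y: float, lower: float, upper: float, alpha: float) -> float:
    """Interval score of a central (1 - alpha) prediction interval."""
    score = upper - lower                      # width: penalises vague forecasts
    if y < lower:
        score += (2.0 / alpha) * (lower - y)   # penalty when truth falls below
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)   # penalty when truth falls above
    return score


def weighted_interval_score(y: float, median: float,
                            intervals: dict[float, tuple[float, float]]) -> float:
    """WIS from a median and K central intervals keyed by alpha.

    intervals maps alpha (e.g. 0.5 for the 50% interval) to (lower, upper).
    Lower scores indicate better forecasts.
    """
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(y, lower, upper, alpha)
    return total / (len(intervals) + 0.5)


# Hypothetical forecast: median 10 with a 50% interval of (8, 12).
print(weighted_interval_score(y=10, median=10, intervals={0.5: (8, 12)}))
print(weighted_interval_score(y=15, median=10, intervals={0.5: (8, 12)}))
```

The second call scores worse than the first because the observed value falls outside the interval and far from the median.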
Charfeddine, N.; Schranz, M.; Schlump, C.; Rupprecht, M.; Ullrich, A.; Diercke, M.; AKTIN Research Group; Estupinan Mendez, J.
Background: Mass gathering events (MGEs) are associated with several public health challenges and may cause a strain on healthcare services. Literature findings on the impact of MGEs on emergency departments (EDs) are heterogeneous. Objectives: To examine shifts in ED attendance characteristics during a major sporting tournament, namely the UEFA European Football Championship 2024 held in Germany. Methods: We conducted a retrospective observational study using ED data from the Emergency Department Data Registry. We compared baseline ED attendance characteristics between the tournament and the reference period, defined as two weeks before and two weeks after the tournament, and between Germany game days and non-Germany game days. Hourly attendance patterns were analysed for all Germany games using a reference range. Results: We included data from 41 EDs, totalling 253,493 attendances during the study period. A 1.57% increase in attendance was observed during the tournament compared to the reference period, with baseline characteristics remaining similar. The median daily attendance within all EDs was slightly lower on Germany game days (4066) compared to non-Germany game days (4128). Modest changes were observed in the hourly attendance on Germany game days, most notable during the last Germany game where a decrease in attendance below the reference range extended over three hours. Conclusions: The observed shifts in ED attendance were minimal, suggesting that no major changes of public health relevance occurred in ED attendance during the tournament. We highlight the utility of using ED data for monitoring and for enhancing the understanding of the public health risks and challenges associated with MGEs.
Naderalvojoud, B.; Sutjiadi, B. J.; Koul, A.; Curtin, C.; Gevaert, O.; Hernandez-Boussard, T.
Background Machine learning (ML) models are increasingly used to predict adverse outcomes after surgery. However, most rely on static patient characteristics (e.g., age, comorbidities) and overlook clinician-controlled treatment decisions that can be actively modified at the point of care. Discharge opioid prescribing is a key modifiable, clinician-controlled decision, yet optimizing prescribing choices across multiple adverse outcomes remains underexplored in predictive modeling. This study addresses that gap by introducing a novel ML framework that explicitly separates fixed patient risk factors from modifiable prescribing options to support personalized, risk-informed opioid prescribing decisions. Methods We developed the Hierarchical Clinical Fusion Transformer (HCF-Transformer), an ML model designed to estimate patient-specific risks across four postoperative outcomes: prolonged opioid use (POU), chronic pain (CP), 30-day readmission, and opioid-associated outcomes (OAO). The model constructs patient risk profiles from fixed, non-modifiable baseline factors, followed by a transformer layer. Clinician-controllable discharge opioid regimens are modeled as alternative intervention candidates and fused with the fixed risk representation through a clinical fusion mechanism, enabling assessment and ranking based on predicted risks. A Total Relative Risk (TRR) metric, calibrated to each outcome prediction threshold, guides the recommendation process. We evaluated the model in diabetic surgical patients, a common high-risk population. Results The study included 157,853 unique diabetic surgical patients, with outcome prevalences ranging from 47.2% (POU) to 1.8% (OAO). The HCF-Transformer achieved the highest AUROCs, 0.798 for POU, 0.712 for 30-day readmission, 0.808 for CP, and 0.922 for OAO, outperforming Random Forest, FT-Transformer, and ResNet-based models. 
Compared to these baselines, HCF-Transformer generated more stable and discriminative risk estimates and demonstrated significant variation in TRR scores across discharge opioid options (ANOVA p < .01, eta-squared > .01). This enabled consistent identification of lower-risk regimens tailored to patient-specific profiles. Conclusions The HCF-Transformer introduces a novel hierarchical fusion approach to optimize opioid prescribing by integrating static patient risk profiles with modifiable discharge options. Using transformer-based modeling and a quantifiable TRR metric, the model delivers personalized, risk-aware recommendations. This approach enables data-driven opioid prescribing tailored to individual risk and has the potential to improve postoperative outcomes in high-risk populations. Our findings demonstrate that integrating modifiable factors with structured risk profiles through a transformer-based fusion architecture can enhance decision-support systems, paving the way for more actionable and personalized AI in healthcare.
Seidel, A.; Steiger, E.; Schuster, J.; Kroll, L. E.
Background: Digital decision-support tools such as triage systems and symptom checkers support millions of health-related decisions each year. Their quality and safety are commonly evaluated using textual patient cases, known as case vignettes. However, existing vignette sets written by medical experts cover only a limited spectrum of real-world patient presentations and lack population weights, which would allow extrapolating evaluation results to the underlying patient population. Objective: This study aims to develop a data-driven framework for automatically generating a human-manageable set of case vignettes from nationwide triage data that captures broad presentation diversity and links each vignette to a quantitative weight reflecting the number of underlying patient assessments. Methods: From 3.2 million triage assessments conducted over one year using structured triage software in the German medical on-call service (telephone triage and online self-triage) and at the joint contact points of the outpatient emergency care service and hospital emergency departments, we randomly sampled 50,000 cases. Triage questionnaires were converted into semantic embeddings using a German Sentence Transformer Model and grouped by agglomerative clustering. For clusters containing sufficient assessments, we generated one representative assessment using a two-phase simulated-annealing optimization. The optimization minimized the distance to the cluster centroid while maximizing the number of answered triage questions, aiming for high representativeness and information content. Each representative assessment was assigned the size of its source cluster as its sample-based weight. A similarity-based sensitivity analysis was performed to examine whether these weights were preserved in the full 1-year population. 
Finally, the question-answer pairs of the representative assessments were converted into structured textual case vignettes using controlled prompting of a large language model. Results: The cluster analysis yielded 514 included clusters covering 96.8% of the sampled 50,000 assessments. The generated representatives showed strong agreement with the majority treatment-urgency recommendation of their source cluster (Spearman's ρ=0.78, p<0.001) and contained on average 4.3 more answered triage questions than the original assessments within their clusters. When weighted by cluster size, the representatives approximated the sample distributions of treatment urgency, demographics, and symptoms, although some systematic deviations remained, most notably an overrepresentation of female cases (+13.5%), patients aged 14-49 years (+8.0%), and the urgency category "As soon as possible" (+6.6%). Of 121 recorded symptoms, 101 (83.5%) were covered by the representatives; the rest each occurred in <0.5% of the sample. In a sensitivity analysis, cluster-based vignette weights were strongly correlated with similarity-based population weights (Spearman's ρ=0.77, p<0.001), and 90.1% of assessments in the full 1-year population were matched to at least one vignette. Conclusions: We present a data-driven framework for deriving a manageable set of population-weighted case vignettes from nationwide triage data. The resulting vignettes captured broad presentation diversity, approximated key sample characteristics, and provided an explicit quantitative link to the number of underlying patient assessments. After medical expert review and refinement, the vignettes may support more population-aware evaluation and quality assurance of digital decision-support tools.
Kim, D.; Pasco, R.; Johnson, K. E.; Fox, S. J.; Reich, N. G.; Meyers, L. A.
Accurate outbreak forecasts are critical for timely and effective public health response. In the United States, however, most forecasts are produced at the state level, which can mask substantial sub-state heterogeneity and limit their utility for local planning. We generated and evaluated forecasts of the percentage of Emergency Department visits attributable to influenza across 173 large metropolitan Health Service Areas (HSAs) using a gradient boosting quantile regression (GBQR) model, and compared their accuracy to forecasts derived from state-level data alone. At one-week, two-week, and three-week horizons, local forecasts outperformed state-based forecasts in 98.8%, 90.8%, and 78.6% of HSAs, respectively, achieving mean weighted interval scores that were on average 39.2% lower (95% range: 5.9% to 76.7%), 19.6% lower (-6.3% to 59.5%), and 11.4% lower (-11.7% to 44.9%), respectively. The performance advantage of local forecasting was strongest in HSAs representing a smaller share of their state's population and increased with the proportion of the HSA population living in urban areas and the number of metropolitan areas within a state. These results, based on an analysis of HSAs with populations greater than 250,000, demonstrate that fine-scale modeling can substantially improve forecast accuracy and highlight the potential value of local forecasts for outbreak preparedness and response.
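For readers unfamiliar with the weighted interval score (WIS) used here, the following is a minimal sketch of the commonly used definition for a quantile forecast made of a median plus central prediction intervals. It is not the authors' evaluation code; the function names, the dictionary layout for intervals, and the `percent_lower` helper are assumptions introduced for illustration.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval.

    Width of the interval plus a penalty, scaled by 2 / alpha, when the
    observation y falls outside [lower, upper].
    """
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score

def weighted_interval_score(median, intervals, y):
    """WIS for one forecast.

    `intervals` maps alpha -> (lower, upper), e.g. {0.5: (2.0, 4.0)} for a
    50% interval. Each interval gets weight alpha / 2 and the absolute error
    of the median gets weight 1 / 2.
    """
    k = len(intervals)
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (k + 0.5)

def percent_lower(local_wis, state_wis):
    """Relative WIS reduction of a local forecast versus a state-based one, in percent."""
    return 100.0 * (1.0 - local_wis / state_wis)
```

Lower WIS is better, so a positive `percent_lower` value means the local forecast scored better than the state-based one, which is the direction of comparison the abstract reports.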
Mahmud, S.
Background Bangladesh has experienced a rapid increase in cesarean section (CS) utilization over the past two decades. While previous studies have documented socioeconomic disparities in CS use, evidence on how wealth-related inequalities differ between public and private healthcare facilities remains limited. This study assessed the magnitude and drivers of socioeconomic inequality in CS utilization among facility-based births in Bangladesh. Methods We analyzed data from 3,008 facility-based births reported in the 2022 Bangladesh Demographic and Health Survey (BDHS). Survey-weighted multivariable logistic regression was used to identify factors associated with CS utilization. Wealth-related inequality was assessed using concentration curves and the Erreygers-corrected concentration index (ECCI). Regression-based decomposition of the standard concentration index was performed to quantify the contribution of socioeconomic, demographic, and healthcare-related factors to observed inequalities overall and separately for public and private facilities. Results Overall, 71.2% of facility-based births were delivered by CS, with substantially higher prevalence in private facilities (84.2%) than in public facilities (35.9%). Women delivering in private facilities had markedly higher odds of CS than those delivering in public facilities (adjusted odds ratio [AOR]: 9.07; 95% confidence interval [CI]: 7.17-11.47). Significant pro-rich inequality was observed overall (ECCI: 0.154; 95% CI: 0.117-0.191), with inequality substantially greater in public facilities (ECCI: 0.189; 95% CI: 0.114-0.264) than in private facilities (ECCI: 0.049; 95% CI: 0.014-0.084). Decomposition analysis showed that household wealth was the dominant contributor to inequality, particularly the richest wealth quintile, accounting for 81.5% of overall inequality, 63.8% in public facilities, and 109.7% in private facilities. 
Conclusions Wealth-related inequalities in CS utilization remain substantial in Bangladesh despite widespread use of the procedure. Although pro-rich inequality exists across both sectors, inequality is considerably greater in public facilities and is driven by different mechanisms across facility types. Policies should simultaneously improve equitable access to medically necessary CS and reduce unnecessary procedures, particularly within the private sector.
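The inequality measures named in this abstract can be sketched as follows. This is a simplified, unweighted illustration: the study uses survey weights, which are omitted here, and ties in wealth are not given special treatment. The function names and record layout are hypothetical. The sketch uses the covariance form of the standard concentration index and the Erreygers correction for a bounded outcome (bounds 0 and 1 for a binary indicator such as CS delivery).

```python
def concentration_index(records):
    """Standard concentration index, unweighted.

    `records` is a list of (wealth, outcome) pairs. Individuals are ranked
    from poorest to richest, and the index is 2 * cov(outcome, rank) / mean,
    where rank is the fractional rank (i + 0.5) / n.
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    n = len(ordered)
    outcomes = [float(outcome) for _, outcome in ordered]
    mean_outcome = sum(outcomes) / n
    if mean_outcome == 0:
        raise ValueError("concentration index is undefined when the mean outcome is zero")
    ranks = [(i + 0.5) / n for i in range(n)]
    covariance = sum(
        (h - mean_outcome) * (r - 0.5) for h, r in zip(outcomes, ranks)
    ) / n
    return 2.0 * covariance / mean_outcome

def erreygers_index(records, lower=0.0, upper=1.0):
    """Erreygers-corrected index: 4 * mean / (upper - lower) * C."""
    mean_outcome = sum(float(outcome) for _, outcome in records) / len(records)
    return 4.0 * mean_outcome / (upper - lower) * concentration_index(records)

# Toy illustration with invented data: only the two richest of four have the outcome.
toy_records = [(1, 0), (2, 0), (3, 1), (4, 1)]
toy_ecci = erreygers_index(toy_records)
```

Positive values indicate that the outcome is concentrated among wealthier individuals, which matches the "pro-rich" reading of the ECCI values in the abstract.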
Collier, A.
Background Electronic health record documentation patterns may reflect workflow complexity, monitoring intensity, and operational strain in intensive care settings. However, documentation-derived features can be sensitive to local documentation culture, data capture systems, and outcome definitions. Retrospective validation across multiple datasets is therefore needed before these signals are used in workflow intelligence or clinical AI governance tools. Objective To evaluate whether documentation-density and documentation-timing features show reproducible retrospective signal for ICU workflow complexity and long-stay proxy outcomes across de-identified critical care datasets, while distinguishing workflow and long-stay associations from unsupported claims about mortality prediction, burden reduction, or deployment readiness. Methods We synthesized retrospective validation results from de-identified ICU and workflow datasets generated through a prespecified documentation-density validation program. Feature families included Documentation Burden Score style features, Shift-End Documentation Rate style features, documentation reliability style metadata, and all-documentation feature sets where available. Outcomes included long ICU length of stay proxies, mortality where available, and workflow proxy endpoints. Models compared baseline feature sets with enhanced models containing documentation-density or workflow features. Performance was summarized using area under the receiver operating characteristic curve, Brier score where reported, delta AUROC, bootstrap confidence intervals where reported, and label-shuffle controls where available. Results The strongest external long-stay proxy evidence came from the NWICU chartevents analysis, which included 28,612 ICU stays, 20,267 stays with chart events, and 9,619,759 chart events. For ICU length of stay greater than the median, baseline AUROC was 0.5252. 
Enhanced AUROC was 0.9512 for Documentation Burden Score features, 0.9214 for Shift-End Documentation Rate features, 0.8470 for documentation reliability style features, and 0.9517 for all documentation features. Corresponding label-shuffle enhanced AUROCs were near random, ranging from 0.4897 to 0.5064. For ICU length of stay greater than the 75th percentile, baseline AUROC was 0.5155. Enhanced AUROC was 0.9433 for Documentation Burden Score features, 0.9194 for Shift-End Documentation Rate features, 0.8118 for documentation reliability style features, and 0.9427 for all documentation features, with label-shuffle enhanced AUROCs from 0.4836 to 0.4999. Additional retrospective support was observed in eICU workflow analyses, HiRID first-24-hour documentation-density analyses, MIMIC-IV HF ICU internal analyses, MIMIC-IV-Note metadata extensions, and nursing-chart or lab density proxy analyses. However, cross-institution discrimination transfer was weak without recalibration, and several analyses remained proxy validations rather than final clinical validations. Conclusions Documentation-density and documentation-timing features show promising retrospective signal for ICU workflow complexity and long-stay proxy outcomes, especially in NWICU chartevents and selected internal dataset-specific analyses. These findings support further preregistered, prospective, silent-mode validation of documentation-derived workflow intelligence. They do not establish prospective clinical performance, mortality reduction, clinician burden reduction, autonomous deterioration prediction, or deployment readiness.
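The label-shuffle control reported here can be illustrated with a minimal sketch. It computes AUROC from ranks (the Mann-Whitney formulation) and then averages AUROC over random permutations of the labels. This is a simplification and an assumption about the procedure: the abstract does not say whether models were refit on shuffled labels, whereas this sketch only shuffles labels against fixed scores. All names and the toy data are hypothetical.

```python
import random

def auroc(scores, labels):
    """AUROC via the Mann-Whitney rank formulation, with midranks for tied scores."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def label_shuffle_auroc(scores, labels, n_repeats=200, seed=0):
    """Mean AUROC after randomly permuting the labels; should sit near 0.5."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(n_repeats):
        rng.shuffle(shuffled)
        total += auroc(scores, shuffled)
    return total / n_repeats

# Toy illustration with invented data: scores perfectly separate the classes.
toy_scores = list(range(40))
toy_labels = [0] * 20 + [1] * 20
toy_auroc = auroc(toy_scores, toy_labels)
toy_shuffled_auroc = label_shuffle_auroc(toy_scores, toy_labels)
```

A shuffled AUROC near 0.5 shows only that the pipeline does not produce signal from unrelated labels. It does not rule out leakage between features and the outcome, for example if documentation volume accumulates mechanically with length of stay.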
Shah, K. P.; Airan Javia, S.; Savage, T.; Bressman, E.
End-of-rotation handoffs are critical for patient safety but add to documentation burden for hospitalists. Generative artificial intelligence (AI) may help automate handoff creation using electronic health record data, but its impact on quality and safety is unclear. Methods: We developed an AI handoff tool with a large language model using clinical notes as input and conducted a retrospective evaluation comparing AI-generated and clinician-authored handoffs. Handoffs were assessed across domains of quality and safety through a structured review. Results: Quality ratings were similar between AI and human handoffs (3.7 vs. 3.5, p=0.57). AI-generated handoffs were rated higher for organization (4.4 vs. 4.1, p=0.05) and completeness (4.1 vs. 3.6, p=0.01), but lower for conciseness (3.7 vs. 4.1, p=0.03) and accuracy (4.1 vs. 4.4, p=0.03). Error rates were comparable (0.3/handoff in both groups); however, AI-generated handoffs included inaccuracies (9% of AI errors) and hallucinations (1% of AI errors), while clinician-authored handoffs contained only omissions. Conclusion: Human and AI handoffs have differing error profiles and tradeoffs between completeness and conciseness. Prospective evaluation in clinical workflows is underway.
Ramzy, L. M.; Rahman, M.; Luque, M. O.; Rodrigues, K. K.; Belknap, R.; Venci, J. A.; Francis, B.; Ruckard, B. J.; Moran-Ibarra, W.; Rasulo, R. M.; Matadi, A.; Ramirez, M. G.; Thee, P. S.; McFeron, H. D.; Monson, S. P.; For the Tuberculosis Epidemiologic Studies Consortium
Purpose: The purpose of this study was to examine the barriers and facilitators experienced by non-U.S. born persons during the diagnosis and treatment of latent tuberculosis infection (LTBI) in primary care settings, including the impact of culturally and linguistically congruent care navigation. Design: Twenty-five interviews with non-U.S. born patients, along with focus groups and surveys with 31 primary care team members and leadership, were conducted. Setting: The study was conducted within a network of Federally Qualified Health Center (FQHC) clinics. Participants: Participants were adult non-U.S. born patients with LTBI and FQHC care team members. A purposefully selected subsample of randomized participants was interviewed. Intervention: Care navigators followed participants randomized to receive care navigation after a positive test for tuberculosis (TB) infection and offered health navigation and education about the importance of TB screening and treatment. Method: Data collection was followed by thematic analysis guided by a critical ideological paradigm. Results: Culturally and linguistically congruent navigation emerged as central to potentially reducing barriers, fostering trust, and improving treatment continuity. Participants without navigation support reported confusion and disengagement from care, while those with culturally aligned navigators described clarity and comfort. Across both groups, experiences were influenced by intrinsic motivation, relational support, and culturally shaped beliefs about care. Conclusion: Care navigation that includes culturally and linguistically congruent navigators whenever possible may help increase LTBI treatment completion among non-U.S. born populations. Limitations of the study include the potential influence of cultural norms, power dynamics, and selection bias.
Tukamuhebwa, P. M.; Nuwabaine, L.
Background Evaluating antenatal care (ANC) quality is critical to reducing maternal and neonatal mortality. In Zambia, despite high basic ANC attendance, comprehensive national evidence on the clinical content and quality of services remains limited. This study assessed the coverage of WHO-recommended ANC interventions and identified factors associated with care quality using the latest national data. Methods A cross-sectional analysis was conducted using data from the 2024 Zambia Demographic and Health Survey. The final analytic sample comprised 4,829 women aged 15-49 with a live birth in the preceding 5 years. A composite index of 15 selected, equally weighted WHO-recommended components evaluated clinical assessment, counseling/screening, preventive interventions, and utilization. Survey-weighted Poisson regression estimated adjusted incidence rate ratios (aIRRs) for the count of ANC components received. Results The mean ANC quality score was 12.5 out of 15 (95% CI: 12.4-12.6), and 78.5% (95% CI: 77.0-80.0) of women achieved adequate ANC (≥12/15 components). While individual clinical and counseling coverage generally exceeded 90%, only 47.2% (95% CI: 45.3-49.0) of women initiated care during the first trimester, and just 4.8% (95% CI: 4.1-5.6) achieved ≥8 ANC contacts. Maternal education was the strongest and most stable predictor of quality across all models. Compared to no education, higher education was associated with an 8.0% higher expected quality score (aIRR = 1.080, 95% CI: 1.051-1.110). Lower ANC quality was significantly associated with unwanted pregnancies (aIRR = 0.970, 95% CI: 0.956-0.993) and with residence in Western (aIRR = 0.923, 95% CI: 0.897-0.951) and North Western (aIRR = 0.966, 95% CI: 0.937-0.996) provinces. Absence of distance barriers and residence in Eastern, Luapula, and Copperbelt provinces were associated with higher quality scores. 
Conclusion While average ANC component coverage in Zambia is high, critical gaps persist in early initiation and total contact frequency. Care adequacy is strongly influenced by maternal education, relationship status, pregnancy intention, and regional inequities. These findings underscore the need for interventions that target women with no formal education, prevent unintended pregnancies, and reach underserved regions such as Western and North Western provinces. Keywords: Antenatal care quality, ANC content, Zambia, maternal education.
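The scoring arithmetic described in this abstract can be shown with a minimal sketch: an equally weighted count of 15 components, an adequacy cut-off of 12, and the conversion of an adjusted incidence rate ratio into a percent difference in the expected score. The constant and function names are hypothetical, and the sketch does not reproduce the survey-weighted Poisson regression itself.

```python
N_COMPONENTS = 15        # number of equally weighted components in the index
ADEQUATE_THRESHOLD = 12  # score at or above which care is classed as adequate

def anc_quality_score(components):
    """Count of components received; `components` is a list of 15 booleans."""
    if len(components) != N_COMPONENTS:
        raise ValueError("expected exactly 15 component indicators")
    return sum(1 for received in components if received)

def is_adequate(score):
    """Adequacy as defined in the abstract: at least 12 of 15 components."""
    return score >= ADEQUATE_THRESHOLD

def irr_to_percent_change(irr):
    """Percent difference in the expected count implied by a rate ratio."""
    return (irr - 1.0) * 100.0
```

For example, `irr_to_percent_change(1.080)` gives about 8.0, which matches the abstract's reading of the higher-education estimate, and an aIRR of 0.923 corresponds to an expected score about 7.7% lower.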
Gharibyan, I.; Ahner, E.; Shao, R.; Sharma, D.; Navarsartian Tazehkand, T.; Diep, J.; Assoumou, B.
Background: Statins are key to preventing atherosclerotic cardiovascular disease and lowering low-density lipoprotein cholesterol and cardiovascular events. However, skepticism regarding their safety and value persists and is increasingly influenced by social media. TikTok has emerged as a major source of health information, but its content varies in quality and accuracy. This study evaluated the quality, attitudes, misinformation, and engagement of statin-related content on TikTok. Methods: Public TikTok videos were collected using predefined search terms and coded by creator type, thematic content, and overall attitude. Video quality was assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Audiovisual Materials, and the Global Quality Score. False or misleading claims were independently reviewed by two cardiology fellows. Associations between engagement and quality were also examined. Results: Of 1,349 screened videos, 258 met inclusion criteria. Most were educational (91.0%), with non-physician healthcare providers (34.5%) as the largest creator group. Risks or negative effects were discussed more often than benefits (63.2% vs 42.2%), and 39.5% contained at least one false or misleading claim, most often from complementary and alternative medicine providers and wellness promoters. Quality differed by creator type across all instruments, with physician-created content scoring highest. Video popularity showed minimal association with informational quality. Conclusion: Statin-related TikTok content frequently emphasizes harms, often contains misinformation, and varies substantially in quality by creator type. Greater involvement of healthcare professionals on social media may help improve digital health literacy and counter misleading information about statin therapy.